Attention Mechanism for Recognition in Computer Vision
It has been shown that humans do not attend to an entire scene at once when they perform a recognition task. Instead, they focus on the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in computer vision recognition tasks by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the dimensionality of the visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework for object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed that learns to focus on the most discriminative regions of a person's image and processes those regions with greater computational capacity using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: the attention maps reveal the regions of the input image that the network relies on to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene for a given task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely. In this scenario, the attention estimate is the final output of the model.
Fourth, an attention-based module and a new loss function are proposed for a meta-learning-based few-shot learning system in order to incorporate the context of the task into the feature representations of the samples and increase few-shot recognition accuracy. In this dissertation, we showed that attention is multifaceted and studied the attention mechanism from the perspectives of feature selection, computational cost reduction, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of these four scenarios, we further advanced the field in which "attention is all you need."
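The abstract does not specify the attention modules themselves. As a rough illustration of the shared idea (score each image region, normalize the scores into an attention map, and pool region features by those weights), here is a minimal soft spatial attention sketch in plain Python. The `query` vector, the dot-product scoring, and the function names are assumptions for illustration, not the architecture used in the dissertation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_spatial_attention(
    region_features: List[List[float]],
    query: List[float],
) -> Tuple[List[float], List[float]]:
    """Attend over image regions (e.g. the cells of a CNN feature map).

    region_features: one feature vector per spatial region.
    query: hypothetical learned vector that scores how relevant a region is.
    Returns (attended_feature, attention_map).
    """
    # Relevance score per region: dot product with the (assumed) query vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    # Normalize into a distribution over regions; this is the attention map.
    weights = softmax(scores)
    # Attended feature: attention-weighted sum of the region features.
    dim = len(region_features[0])
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return attended, weights
```

The returned attention map is the kind of per-region weighting that can be reshaped to the image grid and visualized to see which regions drove a decision.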
Context Aware Road-user Importance Estimation (iCARE)
Road-users are a critical part of decision-making for both self-driving cars
and driver assistance systems. Some road-users, however, are more important for
decision-making than others because of their own intentions, the ego
vehicle's intention, and their effects on each other. In this paper, we propose
a novel architecture for road-user importance estimation that takes advantage
of the local and global context of the scene. For local context, the model
exploits the appearance of the road users (which captures orientation,
intention, etc.) and their location relative to the ego vehicle. The global
context in our model is defined by the feature map of the convolutional layer
of the module that predicts the future path of the ego vehicle; this feature
map contains rich global information about the scene (e.g., infrastructure,
road lanes) as well as the ego vehicle's intention. Moreover, this paper
introduces a new dataset of real-world driving, concentrated around
intersections, that includes annotations of important road users. Systematic
evaluations of our proposed method against several baselines show promising
results.
Comment: Published in: IEEE Intelligent Vehicles (IV), 201
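The abstract names the inputs (road-user appearance, position relative to the ego vehicle, and a feature map from the path-prediction module) but not the fusion layer. A minimal sketch, assuming the local features are concatenated with a globally average-pooled feature map and scored by a single logistic unit, could look like this. The `RoadUser` fields, the pooling choice, and the weights are hypothetical; in the paper the network learns its fusion end to end.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RoadUser:
    # Hypothetical appearance embedding (e.g. from a CNN crop of the road user).
    appearance: List[float]
    # Position relative to the ego vehicle (assumed metres: x lateral, y longitudinal).
    rel_x: float
    rel_y: float


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Average a C x H x W feature map (nested lists) into a C-dim context vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def importance_score(user: RoadUser, global_ctx: List[float],
                     weights: List[float], bias: float) -> float:
    """Importance in (0, 1) from [appearance | relative location | global context]."""
    features = user.appearance + [user.rel_x, user.rel_y] + global_ctx
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)
```

With a hand-picked negative weight on `rel_y`, a road user 2 m ahead scores higher than one 10 m ahead; a trained model would instead learn how appearance, location, and scene context interact.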
FIR-based Future Trajectory Prediction in Nighttime Autonomous Driving
The performance of current collision avoidance systems in Autonomous
Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) can be drastically
affected by low light and adverse weather conditions. Collisions with large
animals such as deer in low light cause significant cost and damage every year.
In this paper, we propose the first AI-based method for predicting the future
trajectory of large animals and mitigating the risk of collision with them in
low light. To minimize false collision warnings, our framework works in
multiple steps. First, the large animal is accurately detected, a preliminary
risk level is predicted for it, and low-risk animals are discarded. In the next
stage, a multi-stream ConvLSTM-based encoder-decoder framework is designed to
predict the future trajectory of the potentially high-risk animals. The
proposed model uses camera motion prediction as well as the local and global
context of the scene to generate accurate predictions. Furthermore, this paper
introduces a new dataset of far-infrared (FIR) videos for large animal
detection and risk estimation in real nighttime driving scenarios. Our
experiments show promising results for the proposed framework in adverse
conditions. Our code is available online.
Comment: Conference: IEEE Intelligent Vehicles 2023 (IEEE IV 2023)
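The multi-step structure (detect, assign a preliminary risk, discard low-risk animals, then forecast only the rest) can be sketched as below. This is a minimal sketch under stated assumptions: the `Detection` fields, the 0.5 `risk_threshold`, and the three-step `horizon` are hypothetical, and the constant-velocity extrapolation is a plain placeholder standing in for the paper's multi-stream ConvLSTM encoder-decoder, not a reproduction of it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    # Past positions of one detected animal, oldest first (hypothetical image coordinates).
    track: List[Point]
    # Preliminary risk level from the first stage, assumed to lie in [0, 1].
    risk: float


def constant_velocity_forecast(track: List[Point], horizon: int) -> List[Point]:
    """Extrapolate the last observed displacement for `horizon` future steps.

    Placeholder only: stands in for the multi-stream ConvLSTM encoder-decoder.
    """
    if not track:
        raise ValueError("track must contain at least one position")
    if len(track) < 2:
        return [track[-1]] * horizon
    (x0, y0), (x1, y1) = track[-2], track[-1]
    dx, dy = x1 - x0, y1 - y0
    return [(x1 + dx * k, y1 + dy * k) for k in range(1, horizon + 1)]


def forecast_high_risk(detections: List[Detection], risk_threshold: float = 0.5,
                       horizon: int = 3) -> List[List[Point]]:
    """Discard low-risk detections, then forecast trajectories for the rest."""
    return [
        constant_velocity_forecast(d.track, horizon)
        for d in detections
        if d.risk >= risk_threshold
    ]
```

Filtering before forecasting is what keeps low-risk animals from ever producing a collision warning, which is the stated motivation for the multi-step design.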
Using simulation to quantify the performance of automotive perception systems
The design and evaluation of complex systems can benefit from a software
simulation, sometimes called a digital twin. The simulation can be used to
characterize system performance or to test its performance under conditions
that are difficult to measure (e.g., nighttime for automotive perception
systems). We describe the image system simulation software tools that we use to
evaluate the performance of image systems for object (automobile) detection. We
describe experiments with 13 different cameras with a variety of optics and
pixel sizes. To measure the impact of camera spatial resolution, we designed a
collection of driving scenes that had cars at many different distances. We
quantified system performance by measuring average precision, and we report a
trend relating system resolution and object detection performance. We also
quantified the large performance degradation under nighttime conditions,
compared to daytime, for all cameras using a COCO pre-trained network.
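The abstract reports results as average precision (AP) but does not say which AP variant was used. As one common definition, here is a sketch of all-point interpolated AP (the PASCAL VOC style) computed from scored detections. It assumes detections have already been matched to ground-truth cars, for example by an IoU threshold; the function name and input format are illustrative. The COCO metric differs in that it samples 101 recall points and averages over IoU thresholds from 0.50 to 0.95.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """All-point interpolated AP.

    detections: (confidence, is_true_positive) pairs, already matched to ground truth.
    num_ground_truth: number of ground-truth objects (e.g. cars) in the evaluation set.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground-truth cars and ranked detections (true, false, true), recall reaches 0.5 at precision 1.0 and 1.0 at precision 2/3, giving AP = 0.5 + 0.5 x 2/3 = 5/6.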